Takes on "Alignment Faking in Large Language Models"

Updated: 2024-12-18
Description

What can we learn from recent empirical demonstrations of scheming in frontier models? Text version here: https://joecarlsmith.com/2024/12/18/takes-on-alignment-faking-in-large-language-models/

Joe Carlsmith